Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(specs): implement sample method #116

Closed
wants to merge 3 commits into from

Conversation

aar65537
Copy link
Contributor

@aar65537 aar65537 commented Apr 8, 2023

Closes #103

  • Implements gym style sample method for each Spec
  • Implements unit tests for each sample method
  • Modifies smoke test to sample action from spec
  • Does not remove generate_value method although possibly redundant

@CLAassistant
Copy link

CLAassistant commented Apr 8, 2023

CLA assistant check
All committers have signed the CLA.

@sash-a
Copy link
Collaborator

sash-a commented Jan 12, 2024

Hi @aar65537, apologies for the late response and thanks very much for the contribution! Had a chat with the team and we can't merge the PR as it is now. Problem is it's not guaranteed to generate valid observations with the sample method (this is what the generators are for) and while it will generate legal actions and thus it would be useful for e-greedy type exploration, we can't have it work for action specs but not for observation specs.

Honestly I'm not sure if there is a clean solution for this problem and as such I'm going to close this PR, but if you do think of something please comment on this/open a new PR/issue.

@sash-a sash-a closed this Jan 12, 2024
@aar65537
Copy link
Contributor Author

Thank you for the response @sash-a. In my mind, env.observation_spec().sample generating invalid observations is correct behavior because you are generating a random Pytree that conforms to the spec, not necessarily a valid observation. While I understand how this might be confusing for users, I still think it would be advantageous to have a standard way to generate samples from a spec.

Perhaps, it would be best to include sample as a private method (_sample) and make it accessible to users through a sample_action method on the Environment class.

@carlosgmartin
Copy link

@sash-a I second the suggestion of @aar65537. Random sampling of actions is a very useful feature to have.

@sash-a
Copy link
Collaborator

sash-a commented Jan 29, 2024

I'm still not 100% convinced by this. I agree that it seems like a nice feature, however the useful part of it is to generate actions for e-greedy like exploration (I don't see another use as we have the generate_value method, let me know if I'm missing something though). The problem with this method for generating those actions is that you're almost always using the vmap wrapper (or directly vmapping step) and as such you won't be able to generate an action of the correct shape as you'll be missing the vmap dimension in the spec. I understand this is how gym does it and it's useful there, however I'm still not convinced this is useful for Jumanji.

I do get that generating illegal states is acceptable behaviour as I think this is how gym specs work so being consistent with those isn't a bad idea, even though the observations would be technically invalid. This was mostly @clement-bonnet's issues with the PR so would be good to get your views on this 😁

@aar65537
Copy link
Contributor Author

e-greedy exploration is probably the main use case. However, a sample_action method would also be useful for smoke testing and possibly benchmarking at some point. In regards to working with vmap, you should be able to just vmap the the sample method. This could even be integrated into the VmapWrapper, so wrapping an environment automatically vmaps its sample_action method.

@sash-a
Copy link
Collaborator

sash-a commented Jan 31, 2024

The vmapping becomes slightly tricky because there is nothing to vmap over as the sample method takes no arguments. I'll try ping @clement-bonnet and see if he has any further opinions 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat(specs): implement sample method
5 participants